SIGNIFICANCE AND EXPLANATION


Much of the literature dealing with system failures assumes that individual subsystems or components are stochastically independent. In this report, some models that have been used for analysing dependent failures are examined and one new model is proposed. These models are of particular interest in the probabilistic risk assessment of nuclear reactors.


The responsibility for the wording and views expressed in this descriptive summary lies with MRC, and not with the author of this report.


STOCHASTIC MODELS FOR COMMON FAILURES OF COMPONENTS

Bernard Harris*


1. Introduction

In the probabilistic modeling of problems in systems reliability, the possibility that failures of components may be stochastically dependent is often neglected. In some areas of application, such as the safety of nuclear reactors, the treatment of such dependent failures has been the subject of substantial controversy. This report is primarily motivated by the treatment of dependent failures in the nuclear reliability literature. Nevertheless, the models described herein have quite general applicability.

We begin this discussion with a summary of the various modes of dependent failures, as classified in the nuclear reliability literature. Unfortunately, the terminology is not consistent, and the same terms are defined differently by different writers. Further, the same mathematical model may be an appropriate description of more than one type of dependent failure.

Specifically, a common mode failure is the simultaneous failure of more than one component. In the engineering literature (see, for example, the PRA Procedures Guide [17]), it is assumed that the failures are not stochastically independent. In the treatment that follows, no assumption about stochastic independence or its absence is made. This is a mathematical convenience, since stochastic independence is a limiting case of stochastic dependence.


*Appeared as a University of Wisconsin-Madison, Department of Statistics Technical Report #727.

Sponsored by the United States Army under Contract No. DAAG29-80-C-0041.




Additional detailed general discussions of common mode failures and their analyses may be found in Edwards and Watson [7] and the Deutsche Risikostudie Kernkraftwerke [10].

A failure of a component or subsystem is said to be a propagating failure when the failure changes the operating conditions, environments or requirements in such a way as to cause the failure of other equipment. Here we interpret this definition in the manner of a classical mathematical model of H. E. Daniels [4], which may be described as follows. Envision a cable consisting of $m$ intertwined wires. This cable supports a load, and the load is distributed among the $m$ wires. If $k$ wires break, $1 \le k < m$, then the load is redistributed among the remaining $m - k$ wires, increasing the chance that they will rupture. It is in this sense that one can model propagating failures; that is, the failure of some components increases the stress on others. P. K. Sen [14, 15] has studied statistical inference for this model. A failure is said to be a common cause failure if more than one component fails due to a single cause (usually assumed to be external to the operating conditions of the equipment). Such common causes may be earthquakes, fires, floods, volcanic eruptions, or lightning strikes.

Consequently, in modeling common cause failures, it is desirable to introduce point processes for initiating events. Physically, an initiating event is to be regarded as an external occurrence, such as a flood, earthquake, power outage, or fire, which can cause the failure of several components simultaneously due to the environmental stresses occasioned by its occurrence.

Another cause of simultaneous failures of several components occurs when one device has several functions, so that its failure prevents each of these individual functions from operating. Such might be the case if two cooling tanks were fed by the same water supply pipe. These types of common mode failures are known as shared-equipment dependencies and should be detectable by an examination of the logic diagram of the system. Such possible common mode failures, being dependent on the engineering design of the system, can be avoided by proper design and are not of concern in this study.

Another possibility which may call for dependent modeling of component lifetimes is the presence of standby components. Such a component is called into use when a specified component or specified components have failed. Thus, it is plausible that a failure may be detected (or may occur, in the case of demands) only after these other components have failed. Consequently, the conditional waiting time until a failure in the standby component is observed is different from the waiting time until failure if it were in primary (non-standby) usage.
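Daniels' cable model of propagating failure, described earlier in this section, can be illustrated with a short simulation. The Python sketch below assumes equal load sharing among intact wires and a deterministic breaking rule (a wire breaks when its share of the load strictly exceeds its strength); the function `surviving_wires` and the strength values are hypothetical illustrations, not taken from Daniels [4] or Sen [14, 15].

```python
def surviving_wires(strengths, total_load):
    """Return the number of intact wires once load redistribution stops.

    Assumptions of this sketch: the load is shared equally among intact
    wires, and a wire breaks when its share strictly exceeds its strength.
    """
    intact = list(strengths)
    while intact:
        share = total_load / len(intact)          # load carried by each intact wire
        still_intact = [s for s in intact if s >= share]
        if len(still_intact) == len(intact):      # no new breaks: cascade has stopped
            break
        intact = still_intact                     # broken wires shed their load
    return len(intact)


# Three wires with strengths 1, 2, 3 under increasing total load.
for load in (3.0, 3.3, 4.5):
    print(load, surviving_wires([1.0, 2.0, 3.0], load))
```

With a total load of 3.0 no wire breaks; at 3.3 only the weakest wire breaks; at 4.5 the first break overloads the remaining wires in turn and the whole cable fails, which is the propagating mechanism described in the text.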

In Section 2, the square root bounding method is discussed. This method was introduced in WASH-1400 [16] and has been severely criticized. The beta factor model is described in Section 3. The common load model of Mankamo [12, 13] is described in Section 4. In Section 5, the binomial failure rate model (see Vesely [18]) is introduced, and in Section 6, a shock model of Apostolakis [1] is defined. A model proposed by the author is presented in Section 7.

The square root bounding method, the beta factor model and the binomial failure rate model are compared in Fleming and Raabe [9]. Concluding remarks are given in Section 8.


2. The Square Root Bounding Method


The highly controversial method discussed in this section was introduced in WASH-1400 [16, Appendix IV].

Let $A_i$, $i = 1,2,\dots,m$, be a finite sequence of events. The authors of WASH-1400 wished to obtain a convenient approximation to $P\big(\bigcap_{i=1}^{m} A_i\big)$. This approximation should be sufficiently simple to permit statistical estimation or to facilitate computation, or be capable of determination from prior knowledge of the properties of the events $A_i$ or from engineering judgment.

In WASH-1400, because of the intended application, it is assumed that the events $A_i$ denote failures of components or subsystems.

We now describe the square root bounding method. Trivially, we have

$$P\Big(\bigcap_{i=1}^{m} A_i\Big) \le P(A_j), \qquad 1 \le j \le m. \tag{2.1}$$


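Let $C_1$ and $C_2$ be arbitrary subsets of $\{1,2,\dots,m\}$. Then, in view of the intended application, we assume

$$P\Big(\bigcap_{i\in C_1} A_i \,\Big|\, \bigcap_{i\in C_2} A_i\Big) \ge P\Big(\bigcap_{i\in C_1} A_i\Big). \tag{2.2}$$

Thus, (2.2) expresses the assumption that the failure of some components will not decrease the probability of the failure of other components.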

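One employs (2.1) and (2.2) to obtain upper and lower bounds to $P\big(\bigcap_{i=1}^{m} A_i\big)$. We denote these bounds by $\bar{P}\big(\bigcap_{i=1}^{m} A_i\big)$ and $\underline{P}\big(\bigcap_{i=1}^{m} A_i\big)$, respectively. Thus,

$$\underline{P}\Big(\bigcap_{i=1}^{m} A_i\Big) \le P\Big(\bigcap_{i=1}^{m} A_i\Big) \le \bar{P}\Big(\bigcap_{i=1}^{m} A_i\Big), \tag{2.3}$$

and the approximation proposed in WASH-1400 is

$$\hat{P}\Big(\bigcap_{i=1}^{m} A_i\Big) = \Big(\underline{P}\Big(\bigcap_{i=1}^{m} A_i\Big)\,\bar{P}\Big(\bigcap_{i=1}^{m} A_i\Big)\Big)^{1/2}, \tag{2.4}$$

hence the name "square root bounding method."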

The precise selection of the bounds does not appear to be completely prescribed by WASH-1400 or by subsequent writing on this procedure. However, the following appears to be the most commonly employed choice.


Example 2.1

From (2.1), it follows by induction that

$$P\Big(\bigcap_{i=1}^{m} A_i\Big) \le \min_{1\le i\le m} P(A_i). \tag{2.5}$$


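To get a lower bound, for $1 \le k \le m$, we write

$$P\Big(\bigcap_{i=1}^{k} A_i\Big) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1\cap A_2)\cdots P\Big(A_k \,\Big|\, \bigcap_{i=1}^{k-1} A_i\Big).$$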


Letting $C_{1j} = \{j\}$ and $C_{2j} = \{1,2,\dots,j-1\}$, $j = 2,3,\dots,k$, we have, from (2.2), $P\big(A_j \mid \bigcap_{i=1}^{j-1} A_i\big) \ge P(A_j)$. Hence, taking $k = m$,

$$P\Big(\bigcap_{i=1}^{m} A_i\Big) \ge \prod_{i=1}^{m} P(A_i). \tag{2.6}$$


In particular, if $P(A_i) = p$, $i = 1,2,\dots,m$, then

$$p^m \le P\Big(\bigcap_{i=1}^{m} A_i\Big) \le p \tag{2.7}$$

and

$$\hat{P}\Big(\bigcap_{i=1}^{m} A_i\Big) = p^{(m+1)/2}. \tag{2.8}$$

If, in a system of $m$ components, $k$ have identical failure distributions (they need not be physically identical), we refer to these components as repeated components.


Example 2.2

Consider an $m$-component parallel system with $m$ repeated components. Let $A_i$ be the event that the $i$th component fails. Then $P\big(\bigcap_{i=1}^{m} A_i\big)$ is the probability that the system fails, and by the square root bounding method we obtain (2.8). In particular, for $m = 2$, $\hat{P}(A_1\cap A_2) = p^{3/2}$. We subsequently examine this special case in substantially more detail.

In G. T. Edwards and I. A. Watson [7], a modification of (2.3) and (2.4) for $k$ of $m$ systems is given. This modification is based on an approximation to (2.3) and (2.4) which may be derived assuming low failure probabilities and employing the $\beta$-factor method, which is discussed in Section 3.

We describe this for 2 of 3 systems.

A 2 of 3 system fails if two or more components fail. If the failures of components are independent and identically distributed with failure probability $p$, then the probability that the system fails is

$$P = p^3 + 3p^2(1-p). \tag{2.9}$$


Under the reasonable assumption that $p$ is small,

$$P \approx 3p^2. \tag{2.10}$$

In Edwards and Watson, this is taken to be the lower bound $\underline{P}$; however, as is evident from (2.9), it is not a lower bound, since $3p^2 - P = 2p^3 > 0$. The reasoning by which this is taken as the lower bound is not given in Edwards and Watson. We can nevertheless justify it as an approximation to the lower bound for small $p$, using some simplified mathematical models to describe common failures. One way to do this uses the beta factor model and will be treated in Section 3.

The upper bound given by Edwards and Watson is $\bar{P} = p$, which is less than (2.9) for $1/2 < p < 1$ and less than (2.10) for $p > 1/3$.
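The crossover points quoted above can be checked numerically. This Python sketch evaluates (2.9), (2.10) and the upper bound $\bar{P} = p$ at a few illustrative values of $p$; the helper names are hypothetical.

```python
def two_of_three_failure(p):
    """Exact 2-of-3 failure probability (2.9) for i.i.d. components."""
    return p**3 + 3 * p**2 * (1 - p)


def small_p_approx(p):
    """The small-p approximation (2.10)."""
    return 3 * p**2


for p in (0.01, 0.3, 0.4, 0.6):
    exact, approx = two_of_three_failure(p), small_p_approx(p)
    # The upper bound p falls below (2.9) only for p > 1/2,
    # and below (2.10) only for p > 1/3.
    print(f"p={p}: (2.9)={exact:.4f}  (2.10)={approx:.4f}  "
          f"p<(2.9): {p < exact}  p<(2.10): {p < approx}")
```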

The rationale given in WASH-1400 for the square root bounding method (2.4) may be summarized as follows.

Let $F_X(x)$ be the log-normal distribution and let $x_\alpha$ be the solution (in $x$) of $F_X(x) = \alpha$. Then $(x_\alpha x_{1-\alpha})^{1/2}$ is the median of the log-normal distribution for every $0 < \alpha < 1$ (because $\ln X$ is normally distributed, $\ln x_\alpha$ and $\ln x_{1-\alpha}$ are symmetric about the logarithm of the median). In WASH-1400 [16, App. IV, p. 19], this is described by saying that "a log-normal was used with its median positioned at the center (geometric midpoint) of the range." In Edwards and Watson [7, p. 110], "these boundary values (i.e., (2.3)) define the range in which the true system failure probability lies and in the WASH-1400 study a log-normal probability distribution was assumed for the range of possible values. Where the common-mode failure probability was not predominant in a system reliability analysis a best estimate was obtained by calculating the median of the log-normal distribution. This is the geometric mean of the range."
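The log-normal property invoked here can be checked directly. The Python sketch below uses `statistics.NormalDist` from the standard library to compute the quantiles $x_\alpha$; the parameters `mu` and `sigma` of $\ln X$ are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

mu, sigma = math.log(1e-4), 1.5     # illustrative parameters of ln X
median = math.exp(mu)               # M, the log-normal median


def lognormal_quantile(alpha):
    """x_alpha, the solution of F_X(x) = alpha."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(alpha))


for alpha in (0.05, 0.01, 0.001):
    geo_mean = math.sqrt(lognormal_quantile(alpha) * lognormal_quantile(1 - alpha))
    print(alpha, geo_mean, median)  # the geometric mean equals M for every alpha
```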
Since the range of the log-normal distribution is $(0,\infty)$, the above statement does not have a precise interpretation as given. If it is modified as follows,

$$\lim_{\alpha\to 0}\,(x_\alpha x_{1-\alpha})^{1/2} = M,$$

then the median $M$ is characterizable in this manner.

However, as stated in WASH-1400, the assumption that the upper and lower bounds of the failure probabilities should be presumed to be symmetrically located tail probabilities of the log-normal distribution is a completely arbitrary assumption. There does not appear to be any logical basis for such an assumption, other than the mathematical convenience of being able to combine the bounds as in (2.4) for the purpose of obtaining a single value "midway" between the two bounds in a well-defined sense.

The use of the log-normal distribution to model the distribution of unknown probabilities is highly questionable. It is possible that a specific Bayesian model, with a prior distribution on $(0,1)$ for the unknown probabilities, might be approximated by a suitable log-normal distribution. The question of the errors introduced by such an approximation would then be a matter for sensitivity analysis and will not be specifically examined in this report.

The Lewis Report [11] was highly critical of the square root bounding model. For the purposes of this report, it is worthwhile to summarize the criticisms given in the Lewis Report. The square root bounding method is described there as follows.


The true system is too complex to calculate a failure probability. Consequently, a simple model is needed. Let $M$ denote a possible model and let $P(M)$ denote the failure probability calculated using this model $M$. Assume that the probability that a given class of models $A$ is correct is representable by

$$Q(A) = \int_A dQ(M). \tag{2.11}$$

Then the failure probability is

$$P = \int P(M)\,dQ(M), \tag{2.12}$$

that is, the mean probability with respect to the probability distribution $Q(M)$. In WASH-1400, $Q(M)$ is taken to be the log-normal distribution. Rather than attempting to characterize the set of possible models, WASH-1400 constructs two models, an upper bound model and a lower bound model. These are selected subjectively, presumably using engineering judgment. It is further assumed that these two models are symmetrically situated, resulting in the average $P$.

"The degree of arbitrariness in this procedure boggles the mind. The lower bound gives a bound which is so low as to be absurd, and there is no reason to believe that the upper bound is in any sense a symmetrically placed upper bound. Nor is there any reason to believe that $Q(M)$ is log-normal. The results are very sensitive to these arbitrary choices."

A somewhat similar critique of the square root bounding method is given by R. G. Easterling [6].


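Assume that there is an event $C$ such that $P(A_1\cap A_2 \mid C)$ is the upper bound and $P(A_1\cap A_2 \mid \bar{C})$ is the lower bound, where $\bar{C}$ denotes the complement of $C$. Then the square root bounding method takes

$$P(A_1\cap A_2) \approx \big(P(A_1\cap A_2 \mid C)\,P(A_1\cap A_2 \mid \bar{C})\big)^{1/2} \tag{2.13}$$

instead of

$$P(A_1\cap A_2) = P(A_1\cap A_2 \mid C)\,P(C) + P(A_1\cap A_2 \mid \bar{C})\,P(\bar{C}), \tag{2.14}$$

a particular case of (2.12).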
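To see how far (2.13) can be from (2.14), the Python sketch below evaluates both for an illustrative pair of conditional probabilities; the numbers and function names are hypothetical. The geometric mean does not depend on $P(C)$ at all, and it agrees with (2.14) only for one particular value of $P(C)$, computed by `implied_prob_c`.

```python
import math


def square_root_estimate(p_upper, p_lower):
    """(2.13): geometric mean of the two conditional probabilities."""
    return math.sqrt(p_upper * p_lower)


def total_probability(p_upper, p_lower, prob_c):
    """(2.14): law of total probability with P(C) = prob_c."""
    return p_upper * prob_c + p_lower * (1.0 - prob_c)


def implied_prob_c(p_upper, p_lower):
    """The value of P(C) at which (2.14) coincides with (2.13)."""
    return (math.sqrt(p_upper * p_lower) - p_lower) / (p_upper - p_lower)


pu, pl = 1e-2, 1e-4                  # illustrative upper and lower conditional probabilities
print(square_root_estimate(pu, pl))  # about 1e-3, and independent of P(C)
for prob_c in (0.001, 0.01, 0.1, 0.5):
    print(prob_c, total_probability(pu, pl, prob_c))
print(implied_prob_c(pu, pl))        # about 0.091
```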

Somewhat more generally, let $C_i$, $i = 1,2,\dots,N$, be a collection of such events. We suppose that the $C_i$'s are numerically valued and approximately log-normally distributed. Let $C_5$ and $C_{95}$ be the lower and upper 5% points, respectively. Then

$$\big(P(A_1\cap A_2 \mid C_5)\,P(A_1\cap A_2 \mid C_{95})\big)^{1/2} \approx P(A_1\cap A_2), \tag{2.15}$$

and the left-hand side is asserted by R. G. Easterling to be the median of the distribution of $P(A_1\cap A_2 \mid C_i)$. Easterling notes that this is not $P(A_1\cap A_2)$ and also that

$$\sum_i P(A_1\cap A_2 \mid C_i)\,P(C_i) = E\{P(A_1\cap A_2 \mid C_i)\} \tag{2.16}$$

is the mean of the distribution ((2.14) is, of course, the same as (2.12) with $P(M)$ replaced by $P(A_1\cap A_2 \mid C_i)$). Easterling further notes that the mean of the log-normal distribution is larger than its median.
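Easterling's last remark is easy to quantify: for a log-normal distribution whose logarithm has standard deviation $\sigma$, the mean exceeds the median by the factor $e^{\sigma^2/2}$. The Python sketch below (illustrative parameters, standard library only) computes the geometric mean of the 5% points, as in (2.15), and compares it with the mean, which plays the role of (2.16).

```python
import math
from statistics import NormalDist

mu, sigma = math.log(1e-4), 1.5          # illustrative parameters of the log probability
z95 = NormalDist().inv_cdf(0.95)
p5 = math.exp(mu - sigma * z95)          # lower 5% point
p95 = math.exp(mu + sigma * z95)         # upper 5% point
median = math.sqrt(p5 * p95)             # geometric mean of the 5% points, cf. (2.15)
mean = math.exp(mu + sigma**2 / 2)       # log-normal mean, cf. (2.16)
print(median, mean, mean / median)       # ratio exp(sigma^2 / 2), about 3.08 here
```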


First, it should be noted that (2.12) and (2.16) are quite different assumptions from (2.4). Specifically, if $M$ in (2.12) is log-normally distributed, this places little restriction on the distribution of $P(M)$. It is $P(M)$, and not $Q(M)$, that is assumed to be log-normally distributed in WASH-1400. While the log-normal distribution provides only a weak justification for (2.4), one may still regard (2.4) as a convenient interpolation between two presumed extreme values. Thus it is of substantially greater interest to ascertain how (2.4) behaves and further to ascertain when it is a reasonable approximation.

For simplicity, we take $m = 2$. Let $X_1, X_2$ be two identically distributed Bernoulli random variables with $\rho(X_1,X_2) \ge 0$, where $\rho$ denotes the correlation coefficient. Then

$$P\{X_1 = 1,\ X_2 = 1\} = \rho p(1-p) + p^2, \tag{2.17}$$

$$P\{X_1 = 1\} = P\{X_2 = 1\} = p. \tag{2.18}$$

The condition $\rho \ge 0$ is equivalent to (2.2) when $m = 2$ and $P(A_1) = P(A_2) = p$. This can be seen as follows:

$$\rho = \frac{P\{X_1 = 1,\ X_2 = 1\} - p^2}{p(1-p)},$$

from which, letting $A_1 = \{X_1 = 1\}$, $A_2 = \{X_2 = 1\}$, we have


$$P(A_1 \mid A_2) = P(A_2 \mid A_1) = \frac{\rho p(1-p) + p^2}{p} = \rho(1-p) + p \ge p,$$

with equality if and only if $\rho = 0$.

The square root bounding method, as illustrated in (2.8), gives the estimate $p^{3/2}$ for this case. Consequently, we examine

$$H_\rho(p) = \frac{p^2 + \rho p(1-p)}{p^{3/2}}. \tag{2.19}$$

In particular, let $\alpha$ and $\beta$ be two designated constants with $\alpha < 1$, $\beta > 1$. The objective is to determine the set

$$D_\rho(\alpha,\beta) = \{p \mid \alpha \le H_\rho(p) \le \beta,\ 0 \le p \le 1\}. \tag{2.20}$$

For $\rho = 0$, $H_0(p) = p^{1/2} < 1$, so that $H_0(p) \le \beta$, $0 < p < 1$; $H_0(p) \ge \alpha$ holds whenever $p \ge \alpha^2$. Similarly, $H_1(p) = p^{-1/2} > 1$, so that $H_1(p) \ge \alpha$, $0 < p < 1$; $H_1(p) \le \beta$ holds whenever $p \ge 1/\beta^2$. For $0 < \rho < 1$, since $H_\rho(p) = ((1-\rho)p + \rho)/p^{1/2}$,

$$\alpha \le H_\rho(p) \le \beta$$

is equivalent to

$$\alpha p^{1/2} \le (1-\rho)p + \rho \le \beta p^{1/2}.$$

Let $u = p^{1/2}$. Then these two inequalities become

$$(1-\rho)u^2 - \beta u + \rho \le 0, \tag{2.21}$$

$$(1-\rho)u^2 - \alpha u + \rho \ge 0. \tag{2.22}$$
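Since the left-hand side of (2.21) equals $1 - \beta < 0$ at $u = 1$, the larger root of that quadratic exceeds 1. Thus (2.21) holds whenever

$$\left(\frac{\beta - \sqrt{\beta^2 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \le p < 1, \tag{2.23}$$

and (2.22) holds for all $0 < p < 1$ whenever $\alpha^2 \le 4\rho(1-\rho)$. Otherwise, (2.22) holds whenever

$$0 < p \le \left(\frac{\alpha - \sqrt{\alpha^2 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \tag{2.24}$$

and

$$\left(\frac{\alpha + \sqrt{\alpha^2 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \le p < 1. \tag{2.25}$$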

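In practice, one will often take $\alpha = 1/\beta$, and values of $\beta$ suggested by the intended application in WASH-1400 are $\sqrt{10}$ and $10$. These are natural because of the interest in measuring errors in orders of magnitude.

We can summarize these results for $\beta = \sqrt{10}$ and $\beta = 10$ as follows (the thresholds .026 and .0025 below are the rounded solutions of $\alpha^2 = 4\rho(1-\rho)$ for $\alpha^2 = .1$ and $\alpha^2 = .01$, respectively).

For $\beta = \sqrt{10}$ and $\rho \ge .026$, (2.21) and (2.22) are satisfied for

$$D_\rho(\alpha,\beta) = \left\{p \,\middle|\, \left(\frac{\sqrt{10} - \sqrt{10 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \le p < 1\right\}.$$

For $\rho < .026$,

$$D_\rho(\alpha,\beta) = \left\{p \,\middle|\, \left(\frac{\sqrt{10} - \sqrt{10 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \le p \le \left(\frac{\sqrt{.1} - \sqrt{.1 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2\right\} \cup \left\{p \,\middle|\, \left(\frac{\sqrt{.1} + \sqrt{.1 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \le p < 1\right\}.$$

For $\beta = 10$ and $\rho \ge .0025$,

$$D_\rho(\alpha,\beta) = \left\{p \,\middle|\, \left(\frac{10 - \sqrt{100 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \le p < 1\right\}.$$

For $\rho < .0025$,

$$D_\rho(\alpha,\beta) = \left\{p \,\middle|\, \left(\frac{10 - \sqrt{100 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \le p \le \left(\frac{.1 - \sqrt{.01 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2\right\} \cup \left\{p \,\middle|\, \left(\frac{.1 + \sqrt{.01 - 4\rho(1-\rho)}}{2(1-\rho)}\right)^2 \le p < 1\right\}.$$

For very small values of $\rho$, the lower limit of $D_\rho(\alpha,\beta)$ is approximately $(\rho/\sqrt{10})^2 = \rho^2/10$ when $\alpha = 1/\sqrt{10}$, $\beta = \sqrt{10}$, and $(\rho/10)^2 = \rho^2/100$ when $\alpha = 1/10$, $\beta = 10$.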

It is also worthwhile to estimate the difference between the two quantities, that is, to consider

$$\Delta_\rho(p) = p^2 + \rho p(1-p) - p^{3/2}$$

for small values of $p$ and small values of $\rho$. Specifically, let $\rho = cp^{a}$, $a \ge 0$. Then it is easily seen that, as $p \to 0$,

$$\Delta_\rho(p) \sim \begin{cases} c\,p^{a+1}, & 0 \le a < 1/2,\\ (c-1)\,p^{3/2}, & a = 1/2,\ c \ne 1,\\ p^2, & a = 1/2,\ c = 1,\\ -p^{3/2}, & 1/2 < a. \end{cases} \tag{2.26}$$

Finally, note that the square root bounding method yields conservative estimates whenever $H_\rho(p) < 1$, that is, whenever $p^{3/2}$ exceeds the true value $p^2 + \rho p(1-p)$.
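The main quantities of this analysis can be evaluated numerically. The Python sketch below (function names hypothetical) computes $H_\rho(p)$ of (2.19), the left end point of (2.23), the threshold value of $\rho$ beyond which (2.22) holds for all $p$, and $\Delta_\rho(p)$. For the end point it uses the algebraically equivalent form $u = 2\rho/(\beta + \sqrt{\beta^2 - 4\rho(1-\rho)})$ of the smaller root, which avoids cancellation for small $\rho$ and remains valid at $\rho = 1$.

```python
import math


def H(rho, p):
    """Ratio (2.19) of P(A1 ∩ A2) to the square root estimate p**1.5 (0 < p <= 1)."""
    return (p**2 + rho * p * (1 - p)) / p**1.5


def lower_end(rho, beta):
    """Smallest p satisfying (2.21), i.e. the left end point in (2.23)."""
    disc = beta**2 - 4 * rho * (1 - rho)
    u = 2 * rho / (beta + math.sqrt(disc))   # smaller root of (1-rho)u^2 - beta u + rho
    return u**2                              # p = u^2


def rho_threshold(alpha):
    """Smallest rho with alpha**2 <= 4 rho (1 - rho), so that (2.22) holds for all p."""
    return (1 - math.sqrt(1 - alpha**2)) / 2


def delta(rho, p):
    """Difference between P(A1 ∩ A2) and the square root estimate, as in (2.26)."""
    return p**2 + rho * p * (1 - p) - p**1.5


print(round(rho_threshold(1 / math.sqrt(10)), 3), round(rho_threshold(0.1), 4))
for rho in (1e-4, 1e-2, 0.5):
    p_low = lower_end(rho, 10.0)
    print(rho, p_low, H(rho, p_low), rho**2 / 100)   # H equals beta at the end point
```

The first line reproduces the thresholds .026 and .0025 used above, and for small $\rho$ the end point approaches $\rho^2/\beta^2$.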


The above discussion was restricted to parallel (redundant) systems of two components. This can be extended to $k$ of $m$ systems, however, at a substantial increase in complexity, which may serve to obscure the conclusions.

3. The Beta Factor Model

The beta factor model is basically a parametrization of binomial or Poisson models in which the failures are divided into two classes, individual and common failures. $\beta$ denotes the expected proportion of failures which are common failures.

Thus, for a Poisson process with intensity $\lambda$, we let $\lambda_i$ denote the expected number of individual failures per unit time and let $\lambda_c$ denote the expected number of common failures per unit time. Then

$$\lambda = \lambda_i + \lambda_c, \qquad \beta = \lambda_c/\lambda, \qquad 0 \le \beta \le 1. \tag{3.1}$$

A description of the beta factor model is given in Edwards and Watson [7]. The technique is due to Fleming [8] and is utilized in Dhillon and Proctor [5].

To apply the beta factor model to the life testing model for systems reliability, one may proceed as follows.

Let $X_1, X_2, \dots, X_m$ be identically distributed random variables with

$$P\{X_i > x\} = e^{-\lambda x}, \qquad x > 0. \tag{3.2}$$

$X_i$, $i = 1,2,\dots,m$, is to be identified as the waiting time to failure of the $i$th component of a system.
Example 3.1. Consider a parallel system of two components. If the components are independent, the probability that the system does not fail on or before time $T$ is

$$R(T) = 2e^{-\lambda T} - e^{-2\lambda T}, \qquad \lambda > 0,\ T > 0. \tag{3.3}$$

Let $P_I(T)$ be the probability that an individual failure of a specified component does not occur on or before $T$. It is assumed that individual failures are independent. Let $P_C(T)$ be the probability that a common failure does not occur before time $T$. Then

$$R(T) = P_C(T)\big\{\big(P_I^2(T) \mid P_C(T)\big) + \big(2P_I(T)(1 - P_I(T)) \mid P_C(T)\big)\big\}, \tag{3.4}$$

upon assuming that the individual failures are conditionally independent, given that no common failure has occurred; the conditioning in (3.4) is on that event. In Edwards and Watson [7], the further simplifying assumption that common failures and individual failures are independent is made, resulting in

$$R(T) = P_C(T)\{P_I^2(T) + 2P_I(T)(1 - P_I(T))\}. \tag{3.5}$$

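Now, using the beta factor model and simplifying, we get

$$R(T) = P_C(T)\{2P_I(T) - P_I^2(T)\}, \tag{3.6}$$

where

$$P_C(T) = e^{-\beta\lambda T}, \qquad P_I(T) = e^{-(1-\beta)\lambda T}, \tag{3.7}$$

so that

$$R(T) = 2e^{-\lambda T} - e^{-2\lambda T + \beta\lambda T}. \tag{3.8}$$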

If $\beta = 0$, that is, there are no common failures, then (3.8) reduces to (3.3). If $\beta = 1$, that is, all failures are common failures, then $R(T) = e^{-\lambda T}$, since both components act as a unit (a single component).
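The closed form (3.8) can be cross-checked by simulating the model that underlies (3.5) and (3.7): each component suffers individual failures at rate $(1-\beta)\lambda$, and common failures, which disable both components at once, occur at rate $\beta\lambda$, independently of the individual failures. The Python sketch below is illustrative only; the function names, sample size and seed are arbitrary choices.

```python
import math
import random


def r_parallel(lam, beta, T):
    """(3.8): reliability of a two-component parallel system, beta factor model."""
    return 2 * math.exp(-lam * T) - math.exp(-(2 - beta) * lam * T)


def waiting_time(rng, rate):
    """Exponential waiting time; a zero rate means the event never happens."""
    return rng.expovariate(rate) if rate > 0 else math.inf


def r_parallel_mc(lam, beta, T, n=100_000, seed=1):
    """Monte Carlo estimate of the same reliability."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n):
        common = waiting_time(rng, beta * lam)           # common failure of both components
        first = waiting_time(rng, (1 - beta) * lam)      # individual failure, component 1
        second = waiting_time(rng, (1 - beta) * lam)     # individual failure, component 2
        system_failure = min(common, max(first, second))
        survived += system_failure > T
    return survived / n


print(r_parallel(1.0, 0.3, 0.5), r_parallel_mc(1.0, 0.3, 0.5))
```

With $\lambda = 1$, $\beta = 0.3$ and $T = 0.5$, the two printed values should agree to within Monte Carlo error (roughly $\pm 0.003$ for this sample size).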

This can be extended to more complicated systems at the cost of increased complexity. A simplified treatment is given in Edwards and Watson [7], where it is assumed that the only common failures are those in which all components fail, a somewhat stringent assumption. A model which does not require this assumption is described in Section 5.

In the engineering literature, it is customary to approximate life testing formulas by assuming that $\lambda T \to 0$, in which case (3.8) is approximated by

$$R(T) \approx 1 - \beta\lambda T. \tag{3.9}$$

This reasoning is applied to the square root bounding model by Edwards and Watson [7]. In particular, consider a 2 of 3 system, that is, a system which operates whenever two or more of the three components function. Then, reasoning as in (3.5),

$$\begin{aligned} R(T) &= P_C(T)\{3P_I^2(T)(1 - P_I(T)) + P_I^3(T)\}\\ &= e^{-\beta\lambda T}\{3e^{-2(1-\beta)\lambda T}(1 - e^{-(1-\beta)\lambda T}) + e^{-3(1-\beta)\lambda T}\}\\ &= 3e^{-(2-\beta)\lambda T} - 3e^{-(3-2\beta)\lambda T} + e^{-(3-2\beta)\lambda T}\\ &= 3e^{-(2-\beta)\lambda T} - 2e^{-(3-2\beta)\lambda T}. \end{aligned} \tag{3.10}$$
Naturally, for $\beta = 0$,

$$R(T) = 3e^{-2\lambda T} - 2e^{-3\lambda T}, \tag{3.11}$$

which has the same form as (2.9) with $p = e^{-\lambda T}$; equivalently, $1 - R(T)$ is (2.9) with $p = 1 - e^{-\lambda T}$. For $\beta = 1$,

$$R(T) = 3e^{-\lambda T} - 2e^{-\lambda T} = e^{-\lambda T}, \tag{3.12}$$

since there is effectively only one component.

Using the approximation obtained by letting $\lambda T \to 0$,

$$R(T) \approx 3(1 - (2-\beta)\lambda T) - 2(1 - (3-2\beta)\lambda T) = 1 - \beta\lambda T. \tag{3.13}$$

Note that for $\beta = 0$, (3.13) does not give a nontrivial approximation to the failure probability $1 - R(T)$. This can be rectified by utilizing the second order terms in (3.11) (the first order terms cancel), obtaining

$$R(T) \approx 1 + 3\cdot\frac{4\lambda^2T^2}{2} - 2\cdot\frac{9\lambda^2T^2}{2} = 1 - 3\lambda^2T^2,$$

so that

$$1 - R(T) \approx 3\lambda^2T^2,$$

in agreement with (2.10).
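The quality of the first and second order approximations can be checked against the exact expression (3.10). In the Python sketch below, `lam_T` stands for the product $\lambda T$, and the chosen values are illustrative.

```python
import math


def r_2_of_3(lam_T, beta):
    """(3.10): reliability of a 2-of-3 system under the beta factor model."""
    return 3 * math.exp(-(2 - beta) * lam_T) - 2 * math.exp(-(3 - 2 * beta) * lam_T)


lam_T = 1e-3
for beta in (0.0, 0.01, 0.1):
    failure = 1 - r_2_of_3(lam_T, beta)
    # (3.13) predicts beta * lam_T; for beta = 0 the second order term 3 (lam_T)^2 takes over.
    print(beta, failure, beta * lam_T, 3 * lam_T**2)
```

For $\beta = 0.01$ the terms $\beta\lambda T = 10^{-5}$ and $3\lambda^2T^2 = 3 \times 10^{-6}$ both contribute noticeably, so neither approximation alone is accurate there.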
4. The Common Load Model

T. Mankamo [12] proposed the following model. Assume that the $m$ components have independent and identically distributed random resistances $R_1, R_2, \dots, R_m$. We denote the probability density function of these resistances by $f_R(x)$. A random stress $S$ with probability density function $g_S(x)$ occurs. Then the event that exactly $k$ of the components fail simultaneously, $k = 1,2,\dots,m$, is given by

$$\{R_{[k]} < S \le R_{[k+1]}\}, \tag{4.1}$$

where $R_{[1]} \le R_{[2]} \le \dots \le R_{[m]}$ are the ordered resistances (with $R_{[m+1]} = \infty$). $S$ is presumed to be independent of $R_1, R_2, \dots, R_m$.

A given component is assumed to fail whenever $R < S$. Thus

$$P\{R < S\} = \int_{-\infty}^{\infty}\int_{-\infty}^{y} f_R(x)\,g_S(y)\,dx\,dy. \tag{4.2}$$

This may be written

$$\int_{-\infty}^{\infty} F_R(y)\,g_S(y)\,dy = E_S\{F_R(S)\}, \tag{4.3}$$

where $F_R(y)$ is the cumulative distribution function of the resistance and $E_S\{F_R(S)\}$ denotes the expected value of $F_R(S)$ computed with respect to the probability distribution of the stress.

It follows that the probability that exactly $k$ components fail when subjected to the random stress $S$ is

$$P_m(k) = \int_{-\infty}^{\infty} \binom{m}{k} (F_R(y))^k (1 - F_R(y))^{m-k}\, g_S(y)\,dy. \tag{4.4}$$

Mankamo illustrates this model under the assumption that both $R$ and $S$ are normally distributed and also under the assumption that both $R$ and $S$ have the log-normal distribution.
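Equation (4.4) is a one-dimensional integral and is easy to evaluate numerically. The Python sketch below assumes normally distributed resistance and stress, as in one of Mankamo's illustrations, and uses a plain trapezoidal rule over a truncated range; the function name, parameter values and truncation width are illustrative assumptions. For $m = 1$ the result can be compared with the closed form $P\{R < S\} = \Phi\big((\mu_S - \mu_R)/\sqrt{\sigma_R^2 + \sigma_S^2}\big)$, which holds when $R$ and $S$ are independent normal variables with means $\mu_R$, $\mu_S$ and variances $\sigma_R^2$, $\sigma_S^2$.

```python
import math
from statistics import NormalDist


def common_load_probs(m, resistance, stress, n_steps=4000, width=10.0):
    """Return [P_m(0), ..., P_m(m)] from (4.4) by the trapezoidal rule.

    resistance, stress: NormalDist objects for R and S (an assumption of
    this sketch). The integral is truncated to the stress mean +/- width sd.
    """
    lo = stress.mean - width * stress.stdev
    hi = stress.mean + width * stress.stdev
    h = (hi - lo) / n_steps
    probs = [0.0] * (m + 1)
    for j in range(n_steps + 1):
        y = lo + j * h
        weight = h * (0.5 if j in (0, n_steps) else 1.0)   # trapezoidal weights
        F = resistance.cdf(y)      # chance that a single resistance is below y
        g = stress.pdf(y)          # stress density at y
        for k in range(m + 1):
            probs[k] += weight * math.comb(m, k) * F**k * (1 - F) ** (m - k) * g
    return probs


R, S = NormalDist(10.0, 1.0), NormalDist(7.0, 2.0)    # illustrative parameters
print(common_load_probs(1, R, S)[1], NormalDist().cdf((7.0 - 10.0) / math.sqrt(5.0)))
print(common_load_probs(3, R, S))                    # exactly k of 3 fail, k = 0..3
```

Because all components see the same stress, $P_2(2) = E_S\{F_R(S)^2\}$ exceeds $(P_1(1))^2 = (E_S\{F_R(S)\})^2$ by Jensen's inequality; this is the sense in which the common load induces dependence.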

For the normal distribution model with variances $\sigma_R^2$ and $\sigma_S^2$, Mankamo proposed the quantity

$$\gamma_{RS} = (1 + \sigma_R^2/\sigma_S^2)^{-1} \tag{4.5}$$

as a measure of the dependence of component failures. A measure based on the relative size of the two variances is logical. If $\sigma_R^2/\sigma_S^2$ is very small, then all components will tend to fail simultaneously or function simultaneously after being subjected to the random stress $S$. If $\sigma_R^2/\sigma_S^2$ is large, then the knowledge that a given component has failed provides little information about the failures of other components. The particular form of $\gamma_{RS}$ chosen by Mankamo has range $0 \le \gamma_{RS} \le 1$, which presumably is found to be intuitively useful.

Mankamo suggests defining a parameter $n_k$ by

$$P_m(k) = (P_m(1))^{n_k}. \tag{4.6}$$

This is an appealing parameterization, since it provides a number $n_k$ which describes the "effective redundancy," or equivalently, $m - n_k$ describes the loss in redundancy due to common failures.
Mankamo says that the common load model is difficult to utilize when failure rates rather than failures on demand are involved. He suggests defining the probability of a common failure of $k$ components by

$$P_k(T) = (1 - e^{-\lambda T})^{n_k}, \tag{4.7}$$

where $n_k$ is determined by (4.6). The customary approximation (3.9) gives

$$P_k(T) \approx (\lambda T)^{n_k}. \tag{4.8}$$

5. The Binomial Failure Rate Model

This model was proposed by W. E. Vesely [18]. An extensive discussion of this model is given by C. L. Atwood [2]. A description of this model follows.

Let $u_i = 1$ if the $i$th component fails and $u_i = 0$ if the $i$th component functions, $i = 1,2,\dots,m$. Then the state of the system is given by a vector $u = (u_1, u_2, \dots, u_m)$, $u_i = 0, 1$. There are $2^m - 1$ possible outcomes in which one or more components fail simultaneously. For each $u$, let

$$f_u(t) = \lambda_u e^{-\lambda_u t}, \qquad t > 0,\ \lambda_u > 0, \tag{5.1}$$

be the probability density function of the waiting time for the failure combination $u$.

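Let $w = w(u) = \sum_{i=1}^{m} u_i$ be the weight of the vector $u$. Then define

$$\lambda_u = \begin{cases} m\lambda + \mu m p q^{m-1}, & w(u) = 1,\\ \mu\binom{m}{i} p^i q^{m-i}, & w(u) = i > 1, \end{cases} \tag{5.2}$$

where $0 < p < 1$, $q = 1 - p$, $\lambda > 0$, $\mu > 0$, $m \ge 2$. Consequently, we simplify notation, writing $\lambda_i = \lambda_u$, $i = 1,2,\dots,m$, where $i = w(u)$.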

Thus, for a parallel system of two components, the system fails by time $T$ if either both of $(0,1)$ and $(1,0)$ occur or the combination $(1,1)$ occurs. Thus the probability that the system fails is

$$(1 - e^{-\lambda_1 T})^2 + (1 - e^{-\lambda_2 T}) \approx (\lambda_1 T)^2 + \lambda_2 T, \tag{5.3}$$

the approximation being obtained using the reasoning employed in deriving (3.9). The common failure rate is

$$\lambda_+ = \sum_{i=2}^{m} \lambda_i. \tag{5.4}$$

A detailed discussion of this model and procedures for statistical estimation of the parameters may be found in C. L. Atwood [2]. The intuitive justification for the model (5.1) and (5.2) may be stated as follows. The individual components have a lifetime distribution determined by the exponential distribution with parameter $\lambda$. Shocks also arrive in accordance with a Poisson process with intensity $\mu$. As each shock occurs, the individual components fail independently with probability $p$. From (5.3) we see that no provision is made for down time; that is, it is assumed that all failures are repaired instantly.
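The rates (5.2) can be tabulated directly. The Python sketch below (illustrative parameter values; function names hypothetical) computes $\lambda_i$, $i = 1,\dots,m$, and the common failure rate $\lambda_+$ of (5.4). Summing the binomial terms gives two convenient checks: $\sum_{i=1}^m \lambda_i = m\lambda + \mu(1 - q^m)$ and $\lambda_+ = \mu(1 - q^m - mpq^{m-1})$.

```python
import math


def bfr_rates(m, lam, mu, p):
    """Rates lambda_i of (5.2), returned as a dict keyed by the weight i."""
    q = 1.0 - p
    rates = {}
    for i in range(1, m + 1):
        shock_part = mu * math.comb(m, i) * p**i * q ** (m - i)   # shocks failing exactly i components
        rates[i] = shock_part + (m * lam if i == 1 else 0.0)       # individual failures add to i = 1
    return rates


def common_failure_rate(rates):
    """lambda_+ of (5.4): the rate of events failing two or more components."""
    return sum(rate for i, rate in rates.items() if i >= 2)


rates = bfr_rates(m=3, lam=1e-3, mu=1e-2, p=0.2)    # illustrative values
print(rates, common_failure_rate(rates))
```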
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6.  A  Shock  Model 

G. Apostolakis [1] proposed the following model. Each of $m$ components has an independent, exponentially distributed lifetime with common parameter $\lambda$. In addition, shocks arrive in accordance with a Poisson process that is stochastically independent of these random lifetimes. This Poisson process has parameter $\lambda_c$, and each shock induces the simultaneous failure of all $m$ components. Thus, there are two possible modes of failure of a given set of $k \le m$ components before a specified time $T$: the $k$ components can fail individually, in accordance with the lifetime distribution, or they can be subjected to a shock inducing a common failure. Consequently, the reliability of a parallel system of $k$ components is


$$R(T) = \bigl[1 - (1 - e^{-\lambda T})^k\bigr]\, e^{-\lambda_c T}. \tag{6.1}$$


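Similarly, for a $k$-of-$m$ system (one that functions as long as at least $k$ of its $m$ components function),

$$R(T) = e^{-\lambda_c T} \sum_{r=k}^{m} \binom{m}{r} e^{-r\lambda T}\bigl(1 - e^{-\lambda T}\bigr)^{m-r}. \tag{6.2}$$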
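Both reliability formulas can be evaluated directly. In the Python sketch below, the function names and parameter values are hypothetical. It implements (6.1) and (6.2) and uses two special cases as consistency checks: a 1-of-$k$ system is a parallel system of $k$ components, and an $m$-of-$m$ system is a series system with reliability $e^{-(m\lambda + \lambda_c)T}$.

```python
from math import comb, exp

def parallel_reliability(k, lam, lam_c, T):
    """(6.1): parallel system of k components under the Apostolakis shock model."""
    return (1 - (1 - exp(-lam * T)) ** k) * exp(-lam_c * T)

def k_of_m_reliability(k, m, lam, lam_c, T):
    """(6.2): no common shock by T and at least k of m components survive."""
    s = exp(-lam * T)  # probability a component has not failed individually by T
    survive = sum(comb(m, r) * s ** r * (1 - s) ** (m - r) for r in range(k, m + 1))
    return exp(-lam_c * T) * survive

# Illustrative values only.
print(parallel_reliability(3, 0.01, 0.001, 10.0))
print(k_of_m_reliability(2, 3, 0.01, 0.001, 10.0))
```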


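Write

$$1 - \sum_{r=k}^{m} \binom{m}{r} e^{-r\lambda T}\bigl(1 - e^{-\lambda T}\bigr)^{m-r} = \sum_{r=0}^{k-1} \binom{m}{r} e^{-r\lambda T}\bigl(1 - e^{-\lambda T}\bigr)^{m-r}. \tag{6.3}$$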


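As $\lambda T \to 0$,

$$\sum_{r=0}^{k-1} \binom{m}{r} e^{-r\lambda T}\bigl(1 - e^{-\lambda T}\bigr)^{m-r} = \binom{m}{k-1} e^{-(k-1)\lambda T}\bigl(1 - e^{-\lambda T}\bigr)^{m-k+1}\bigl(1 + O(\lambda T)\bigr) = \binom{m}{k-1} (\lambda T)^{m-k+1}\bigl(1 + O(\lambda T)\bigr). \tag{6.4}$$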
Thus, if in addition $\lambda_c T \to 0$,

$$R(T) \approx (1 - \lambda_c T)\Bigl(1 - \binom{m}{k-1} (\lambda T)^{m-k+1}\Bigr). \tag{6.5}$$

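In this form, it is possible to determine how significant the probability of common failure is relative to the overall reliability: the factor $1 - \lambda_c T$ accounts for common failures, while $1 - \binom{m}{k-1}(\lambda T)^{m-k+1}$ accounts for independent failures. Apostolakis [1] gave a representation in terms of the hazard function of the $k$-of-$m$ system lifetime.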
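The comparison described above can be made numerically. The following Python sketch (function names and parameter values are illustrative assumptions) evaluates the exact reliability (6.2) and the approximation (6.5) and splits the leading-order unreliability into its common-failure and independent-failure parts.

```python
from math import comb, exp

def exact_reliability(k, m, lam, lam_c, T):
    """R(T) from (6.2)."""
    s = exp(-lam * T)
    return exp(-lam_c * T) * sum(
        comb(m, r) * s ** r * (1 - s) ** (m - r) for r in range(k, m + 1))

def approx_reliability(k, m, lam, lam_c, T):
    """Approximation (6.5), valid when lam*T and lam_c*T are both small."""
    return (1 - lam_c * T) * (1 - comb(m, k - 1) * (lam * T) ** (m - k + 1))

def failure_contributions(k, m, lam, lam_c, T):
    """Leading-order unreliability from common shocks vs. independent failures."""
    return lam_c * T, comb(m, k - 1) * (lam * T) ** (m - k + 1)

# Illustrative values: a 2-of-3 system with lambda*T = 1e-3 and lambda_c*T = 1e-4.
args = (2, 3, 1e-3, 1e-4, 1.0)
print(exact_reliability(*args), approx_reliability(*args))
common, independent = failure_contributions(*args)
print(common / independent)
```

With these illustrative numbers the common-failure term $\lambda_c T = 10^{-4}$ exceeds the independent term $3(\lambda T)^2 = 3 \times 10^{-6}$ by a factor of about 33. In this case, the shocks dominate the unreliability of the 2-of-3 system.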


7.  A  Suggested  Common  Failure  Model. 

The model described here was proposed by the author and is motivated by some of his work [3] on stress-strength models in reliability. This model is an extension of the common load model of Mankamo [12, 13]. It is also related to the binomial failure model but is, in a probabilistic sense, more fundamental than that model.

Specifically, the model is defined as follows.

Let $N(t)$ be the number of shocks arriving on or before $t$, $0 \le t \le T$.

If $N(T) = n$ shocks have arrived in $[0,T]$, we designate the arrival times by $0 < t_1 < \cdots < t_n \le T$. The shocks are assumed to have random magnitudes $X(t_1), X(t_2), \ldots, X(t_n)$; it is further assumed that $X(t_1), X(t_2), \ldots, X(t_n)$ are independent and identically distributed. Consider a system of $m$ components. To each component, we associate independent random variables $Y_1, Y_2, \ldots, Y_m$. Component $i$ is said to fail at time $t_j$ whenever $X(t_j) > Y_i$, $i = 1, 2, \ldots, m$. It is convenient to order $(Y_1, Y_2, \ldots, Y_m)$, replacing them by the random variables $0 \le Y_{[1]} \le Y_{[2]} \le \cdots \le Y_{[m]}$. Then $r$ components fail simultaneously at time $t_j$ whenever

$$Y_{[r]} < X(t_j) \le Y_{[r+1]}, \qquad r = 0, 1, \ldots, m, \tag{7.1}$$

where $Y_{[0]} = 0$ and $Y_{[m+1]} = \infty$.
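For fixed strengths, condition (7.1) amounts to counting how many order statistics lie strictly below the shock magnitude. A binary search on the sorted strengths does this directly. The following is a minimal Python sketch; the function name is hypothetical.

```python
from bisect import bisect_left

def failures_at_shock(strengths, x):
    """Number r of components failing under a shock of magnitude x, as in (7.1).

    Component i fails when x > Y_i, so r is the number of order statistics
    Y_[1] <= ... <= Y_[m] lying strictly below x.
    """
    ordered = sorted(strengths)
    return bisect_left(ordered, x)  # count of ordered values < x

ys = [3.0, 1.0, 2.0]
for x in (0.5, 2.0, 2.5, 10.0):
    print(x, failures_at_shock(ys, x))
```

A shock of magnitude exactly equal to a strength does not fail that component, which matches the non-strict inequality $X(t_j) \le Y_{[r+1]}$ in (7.1).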

Within this structure, several specializations suited to a number of potential applications can be prescribed.

In some instances, one may wish to assume that $Y_{[1]}, Y_{[2]}, \ldots, Y_{[m]}$ are known. This information may be obtained from non-destructive testing or from extensive knowledge of the physical properties of the components. This model is closely related to the model described in J. D. Church and B. Harris [3].

Another modification of interest is the following. The random variables $Y_i$, $i = 1, 2, \ldots, m$, which represent the strengths or resistances of the individual components, are subject to "wear out." This can be modeled by defining a family of random functions $Y_i(t)$, $i = 1, 2, \ldots, m$, where $Y_i(t_1) \ge Y_i(t_2)$ for $t_1 < t_2$ and $Y_i(t) \to 0$ as $t \to \infty$. The precise choice of these functions would require specific knowledge of the physical characteristics of the components.

Further, it may be appropriate to assume that the shocks have a degrading effect. That is, a shock that does not cause failure of a component may weaken that component, so that the next shock is more likely to induce a failure. This may be described by introducing functions as follows:

$$Y_i(t_j^+) = H\bigl(Y_i(t_j), X(t_j)\bigr), \tag{7.2}$$

where $Y_i(t_j^+) \le Y_i(t_j)$. To characterize the functions (7.2), engineering models for fatigue and shock damage are needed, and such models will depend on the precise nature of the components.
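The degradation mechanism (7.2) can be combined with the failure rule $X(t_j) > Y_i$ in a short simulation over a given sequence of shock magnitudes. In the Python sketch below, the default damage function is a hypothetical placeholder chosen only to satisfy $H(y, x) \le y$; as the text notes, a realistic $H$ would come from an engineering model of fatigue or shock damage. Failed components are not replaced in this sketch, and the function names are illustrative.

```python
def half_magnitude_damage(y, x):
    """Hypothetical H(y, x): a survived shock removes half its magnitude from the strength."""
    return max(y - 0.5 * x, 0.0)

def first_failures(strengths, shocks, H=half_magnitude_damage):
    """Index j of the first shock that fails each component (None if it survives all).

    A component fails at shock j when X(t_j) > Y_i(t_j); if it survives, its
    strength is updated to H(Y_i(t_j), X(t_j)), as in (7.2).
    """
    y = list(strengths)
    first = [None] * len(y)
    for j, x in enumerate(shocks):
        for i in range(len(y)):
            if first[i] is not None:
                continue  # component already failed
            if x > y[i]:
                first[i] = j
            else:
                y[i] = H(y[i], x)  # degraded strength Y_i(t_j+)
    return first

print(first_failures([1.0, 2.0], [0.9, 0.9, 0.9]))
print(first_failures([1.0, 2.0], [0.9, 0.9, 0.9], H=lambda y, x: y))  # no degradation
```

Without degradation neither component fails, because every shock is weaker than both strengths. With the hypothetical damage rule, the weaker component fails at the second shock.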

Some  specific  illustrations  follow. 

Example 7.1. Assume that $Y_1 = y_1, Y_2 = y_2, \ldots, Y_m = y_m$ are known and that the waiting times between shocks are independent and exponentially distributed with common parameter $\lambda$. Assume further that $X(t_1), X(t_2), \ldots$ are independent and identically distributed with probability density function

$$f_X(x) = \beta e^{-\beta x}, \qquad x > 0,\ \beta > 0.$$

With no loss of generality, we can assume $y_1 < y_2 < \cdots < y_m$. Then let $Z_j$ be the number of components failing at time $t_j$. Accordingly,

$$P\{Z_j = r\} = e^{-\beta y_r} - e^{-\beta y_{r+1}}, \qquad r = 0, 1, \ldots, m, \tag{7.3}$$

where $y_0 = 0$ and $y_{m+1} = \infty$.
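Equation (7.3) is the probability that an exponentially distributed shock magnitude falls between consecutive known strengths. The terms telescope, so they sum to one. The following Python sketch is a short check; the function name and parameter values are hypothetical.

```python
from math import exp, inf

def shock_failure_pmf(strengths, beta):
    """P{Z_j = r}, r = 0..m, from (7.3): known strengths, exponential(beta) magnitudes."""
    y = [0.0] + sorted(strengths) + [inf]  # y_0 = 0 and y_{m+1} = infinity
    return [exp(-beta * y[r]) - exp(-beta * y[r + 1]) for r in range(len(strengths) + 1)]

pmf = shock_failure_pmf([1.0, 2.0, 3.0], beta=1.0)
print(pmf, sum(pmf))
```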

Then the probability of a common cause failure in $[0,T]$ is

$$P_c(T) = 1 - P\{Z_1 \le 1, Z_2 \le 1, \ldots\}. \tag{7.4}$$

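In (7.4), there is a tacit assumption that failed components are "instantaneously" replaced or repaired. Thus, since $N(T)$ has a Poisson distribution with mean $\lambda T$, and the $Z_j$ are independent of $N(T)$ with $P\{Z_j \le 1\} = 1 - e^{-\beta y_2}$,

$$P_c(T) = 1 - \sum_{n=0}^{\infty} e^{-\lambda T} \frac{(\lambda T)^n}{n!} \bigl(1 - e^{-\beta y_2}\bigr)^n = 1 - \exp\bigl(-\lambda T e^{-\beta y_2}\bigr). \tag{7.5}$$

Hence, for this very special model, one does get a "nice" answer.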

One can easily extend (7.4) and (7.5) to calculate the probability that $\ell$ or more components fail simultaneously.
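Under the assumptions of Example 7.1, one way to evaluate such probabilities is to condition on $N(T)$, which is Poisson with mean $\lambda T$. A shock fails at least $\ell$ components exactly when its magnitude exceeds $y_\ell$, which happens with probability $e^{-\beta y_\ell}$, independently from shock to shock. Summing the Poisson series then gives $1 - \exp(-\lambda T e^{-\beta y_\ell})$, with $\ell = 2$ corresponding to the common cause failure probability (7.4). The Python sketch below is our own check rather than part of the report. It compares this expression with a direct simulation of the shock process, and the names and parameter values are hypothetical.

```python
import random
from math import exp

def prob_l_or_more(strengths, lam, beta, T, l=2):
    """P{some shock in [0, T] fails at least l components} = 1 - exp(-lam*T*exp(-beta*y_l))."""
    y_l = sorted(strengths)[l - 1]
    return 1.0 - exp(-lam * T * exp(-beta * y_l))

def simulate_l_or_more(strengths, lam, beta, T, l=2, trials=20000, seed=1):
    """Monte Carlo estimate of the same probability (failed components replaced instantly)."""
    rng = random.Random(seed)
    y_l = sorted(strengths)[l - 1]
    hits = 0
    for _ in range(trials):
        t = rng.expovariate(lam)  # first shock arrival
        while t <= T:
            x = rng.expovariate(beta)  # shock magnitude with density beta*exp(-beta*x)
            if x > y_l:  # at least l strengths lie below x
                hits += 1
                break
            t += rng.expovariate(lam)  # next inter-arrival time
    return hits / trials

ys, lam, beta, T = [1.0, 2.0, 3.0], 2.0, 1.0, 1.0
print(prob_l_or_more(ys, lam, beta, T), simulate_l_or_more(ys, lam, beta, T))
```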


8.  Concluding  Remarks. 

The present report has described several possible models for common failures, one of which is believed to be new. With the exception of the square root bounding method, all appear to be plausible models and can presumably be regarded, on probabilistic grounds, as approximations to reality under suitable physical conditions. The next step is to extend the probabilistic models described herein to systems commonly encountered in practice. Then one should compare the models with existing data on common failures. Finally, statistical inference for these models needs to be studied. These investigations will be taken up in future reports.